feat: add mask support to StatisticsCollector #1665

Closed
DrakeLin wants to merge 1 commit into delta-io:main from DrakeLin:stack/stats-collector-mask

@DrakeLin DrakeLin commented Jan 23, 2026

🥞 Stacked PR

Use this link to review incremental changes.


  • Add NullBuffer mask parameter to update()
  • Only count masked-in rows for numRecords
  • Only count nulls in masked-in rows for nullCount
  • Filter column by mask before computing min/max
  • Tests for mask behavior with min/max and null counting

This enables deletion vector support where masked-out rows
should not contribute to file statistics.
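
The masking rules listed above can be illustrated with a minimal, dependency-free sketch. This is not the kernel's `StatisticsCollector`: the `MaskedStats` type, its fields, and the `update` signature below are hypothetical, and a plain `&[bool]` stands in for the Arrow `NullBuffer` mask named in the PR. The sketch also assumes that `true` in the mask means the row is kept (masked in).

```rust
/// Hypothetical per-column statistics accumulator for illustration only.
/// It is not the kernel's `StatisticsCollector`.
#[derive(Debug, Default, PartialEq)]
pub struct MaskedStats {
    /// Number of masked-in rows seen so far.
    pub num_records: u64,
    /// Number of masked-in rows whose value is null.
    pub null_count: u64,
    /// Smallest non-null value among masked-in rows, if any.
    pub min: Option<i64>,
    /// Largest non-null value among masked-in rows, if any.
    pub max: Option<i64>,
}

impl MaskedStats {
    /// Folds one batch into the running statistics.
    ///
    /// Assumption: `mask[i] == true` means row `i` is kept. Passing `None`
    /// keeps every row, which matches collecting stats without a mask.
    pub fn update(
        &mut self,
        column: &[Option<i64>],
        mask: Option<&[bool]>,
    ) -> Result<(), String> {
        if let Some(m) = mask {
            if m.len() != column.len() {
                return Err(format!(
                    "mask length {} does not match column length {}",
                    m.len(),
                    column.len()
                ));
            }
        }
        for (i, value) in column.iter().enumerate() {
            // Masked-out rows contribute to none of the statistics.
            let selected = mask.map_or(true, |m| m[i]);
            if !selected {
                continue;
            }
            self.num_records += 1;
            match value {
                None => self.null_count += 1,
                Some(v) => {
                    self.min = Some(self.min.map_or(*v, |cur| cur.min(*v)));
                    self.max = Some(self.max.map_or(*v, |cur| cur.max(*v)));
                }
            }
        }
        Ok(())
    }
}
```

For a column `[5, null, -3, 100, null]` with mask `[true, true, false, false, true]`, this sketch counts three records and two nulls, and both min and max are 5, because -3 and 100 are masked out. With no mask, the same column yields five records, two nulls, min -3, and max 100.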

What changes are proposed in this pull request?

How was this change tested?

@DrakeLin DrakeLin force-pushed the stack/stats-collector-mask branch 2 times, most recently from c67b904 to 0a9da9c Compare January 23, 2026 06:39
@github-actions github-actions bot added the breaking-change Public API change that could cause downstream compilation failures. Requires a major version bump. label Jan 23, 2026
@DrakeLin DrakeLin force-pushed the stack/stats-collector-mask branch from 0a9da9c to 397df67 Compare January 23, 2026 07:01
@DrakeLin DrakeLin force-pushed the stack/stats-collector-mask branch 4 times, most recently from 8b44391 to 37a216b Compare January 23, 2026 07:45
codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 70.96774% with 270 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.28%. Comparing base (d4ecc0a) to head (fc10d58).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/engine/default/stats.rs | 67.76% | 228 Missing and 26 partials ⚠️ |
| kernel/src/transaction/mod.rs | 61.90% | 8 Missing ⚠️ |
| kernel/src/table_configuration.rs | 50.00% | 5 Missing ⚠️ |
| kernel/src/scan/data_skipping/stats_schema.rs | 97.40% | 1 Missing and 1 partial ⚠️ |
| kernel/src/snapshot.rs | 80.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1665      +/-   ##
==========================================
- Coverage   84.65%   84.28%   -0.37%     
==========================================
  Files         123      125       +2     
  Lines       34109    35333    +1224     
  Branches    34109    35333    +1224     
==========================================
+ Hits        28875    29781     +906     
- Misses       3905     4186     +281     
- Partials     1329     1366      +37     

@DrakeLin DrakeLin force-pushed the stack/stats-collector-mask branch 8 times, most recently from 226bff5 to fc10d58 Compare January 23, 2026 23:29
@DrakeLin DrakeLin force-pushed the stack/stats-collector-mask branch from fc10d58 to 33ec3ab Compare February 5, 2026 01:12
@DrakeLin DrakeLin closed this Feb 18, 2026